
fix(router): use max_completion_tokens for OpenAI GPT-5+ validation #575

Open
cluster2600 wants to merge 1 commit into NVIDIA:main from cluster2600:fix/517-max-completion-tokens
Conversation

@cluster2600

Summary

Resolves #517: `openshell inference set` fails for OpenAI GPT-5 models because the validation probe sends the deprecated `max_tokens` parameter, which GPT-5+ rejects with HTTP 400.

  • Send max_completion_tokens as the primary parameter in the OpenAI chat completions validation probe
  • Automatically fall back to max_tokens when the backend returns HTTP 400 (for legacy or self-hosted backends)
  • Extract try_validation_request() helper to avoid duplicating the request/response classification logic
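
A minimal sketch of the probe-with-fallback flow described above. The names `validation_probe` and `try_validation_request` mirror the PR description, but the signatures are assumptions; the real code in `crates/openshell-router` talks to an HTTP client rather than a closure:

```rust
// Sketch only: a closure stands in for the real HTTP request sender.
#[derive(Debug, PartialEq)]
enum ProbeOutcome {
    Validated,
    Failure(u16),
}

/// Sends one probe body and classifies the response status.
fn try_validation_request(send: &dyn Fn(&str) -> u16, body: &str) -> ProbeOutcome {
    match send(body) {
        200 => ProbeOutcome::Validated,
        code => ProbeOutcome::Failure(code),
    }
}

/// Primary probe uses `max_completion_tokens`; only an HTTP 400 on the
/// primary attempt triggers the legacy `max_tokens` retry.
fn validation_probe(send: &dyn Fn(&str) -> u16) -> ProbeOutcome {
    let primary = r#"{"max_completion_tokens": 32}"#;
    let fallback = r#"{"max_tokens": 32}"#;
    match try_validation_request(send, primary) {
        ProbeOutcome::Failure(400) => try_validation_request(send, fallback),
        outcome => outcome,
    }
}

fn main() {
    // Legacy backend: rejects the new parameter, accepts the old one.
    let legacy = |body: &str| if body.contains("max_completion_tokens") { 400 } else { 200 };
    assert_eq!(validation_probe(&legacy), ProbeOutcome::Validated);

    // GPT-5+ backend: accepts the new parameter directly.
    let modern = |_: &str| 200;
    assert_eq!(validation_probe(&modern), ProbeOutcome::Validated);
}
```

Note that a non-400 failure (e.g. HTTP 500) is not retried, so genuine endpoint errors still surface immediately.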

Root Cause

OpenAI introduced max_completion_tokens as a replacement for max_tokens starting with the o1 series. GPT-5 and later models reject max_tokens entirely, returning HTTP 400. The validation probe only sent max_tokens, so inference setup would fail for any GPT-5+ model even though the endpoint was perfectly healthy.
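
For illustration, the difference between the two probe bodies is a single field name. The model name, message content, and token limit below are placeholders, not values taken from the PR:

```rust
/// Builds a hypothetical chat-completions probe body; only the token-limit
/// field name changes between the modern and legacy variants.
fn probe_body(use_completion_tokens: bool) -> String {
    let limit_field = if use_completion_tokens {
        "max_completion_tokens" // required by o1-series and GPT-5+ models
    } else {
        "max_tokens" // legacy field, rejected with HTTP 400 by GPT-5+
    };
    format!(
        r#"{{"model":"gpt-5","messages":[{{"role":"user","content":"ping"}}],"{}":32}}"#,
        limit_field
    )
}

fn main() {
    assert!(probe_body(true).contains(r#""max_completion_tokens":32"#));
    assert!(probe_body(false).contains(r#""max_tokens":32"#));
}
```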

```mermaid
graph TD
    subgraph "Before (broken)"
        A["validation_probe()"] -->|"max_tokens: 32"| B[OpenAI API]
        B -->|"HTTP 400: unsupported parameter"| C["ValidationFailure ❌"]
    end

    subgraph "After (fixed)"
        D["validation_probe()"] -->|"max_completion_tokens: 32"| E[OpenAI API]
        E -->|"HTTP 200"| F["ValidatedEndpoint ✅"]
        E -->|"HTTP 400"| G{fallback_body?}
        G -->|"yes"| H["retry with max_tokens: 32"]
        H -->|"HTTP 200"| I["ValidatedEndpoint ✅"]
        G -->|"no"| J["ValidationFailure ❌"]
    end
```

Changes

| File | Change |
| --- | --- |
| `crates/openshell-router/src/backend.rs` | Add `fallback_body` field to `ValidationProbe`; update `openai_chat_completions` probe to use `max_completion_tokens` with `max_tokens` fallback; extract `try_validation_request()` helper; add 3 new tests |
| `crates/openshell-server/src/inference.rs` | Update existing test expectation from `max_tokens` to `max_completion_tokens` |

Test Plan

  • `cargo test -p openshell-router` — 11 passed, 0 failed
  • New test: `verify_openai_chat_uses_max_completion_tokens` — primary probe succeeds with `max_completion_tokens`
  • New test: `verify_openai_chat_falls_back_to_max_tokens` — HTTP 400 on primary triggers retry with `max_tokens`
  • New test: `verify_non_chat_completions_no_fallback` — non-chat protocols (e.g. `anthropic_messages`) do not retry on 400
```mermaid
sequenceDiagram
    participant CLI as openshell inference set
    participant Router as Privacy Router
    participant Backend as OpenAI API

    CLI->>Router: verify_backend_endpoint()
    Router->>Backend: POST /v1/chat/completions<br/>{"max_completion_tokens": 32}

    alt GPT-5+ model
        Backend->>Router: HTTP 200
        Router->>CLI: ValidatedEndpoint ✅
    else Legacy backend
        Backend->>Router: HTTP 400 (unknown param)
        Router->>Backend: POST /v1/chat/completions<br/>{"max_tokens": 32}
        Backend->>Router: HTTP 200
        Router->>CLI: ValidatedEndpoint ✅
    end
```
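
The fallback test in the plan above can be sketched roughly as follows. This is a hedged stand-in: the real tests in `crates/openshell-router` presumably mock the HTTP backend, whereas here a recording closure plays that role, and `run_probe` is a hypothetical helper, not an identifier from the PR:

```rust
/// Runs the two-step probe against a fake backend and returns the final
/// status plus every request body that was sent.
fn run_probe(backend_rejects_new_param: bool) -> (u16, Vec<String>) {
    let mut sent = Vec::new();
    let mut send = |body: &str| -> u16 {
        sent.push(body.to_string());
        if backend_rejects_new_param && body.contains("max_completion_tokens") {
            400 // legacy backend: unknown parameter
        } else {
            200
        }
    };

    let status = {
        let first = send(r#"{"max_completion_tokens":32}"#);
        // Only HTTP 400 triggers the legacy retry.
        if first == 400 { send(r#"{"max_tokens":32}"#) } else { first }
    };
    (status, sent)
}

fn main() {
    // Mirrors verify_openai_chat_falls_back_to_max_tokens: two requests,
    // the second using the legacy parameter, final status 200.
    let (status, bodies) = run_probe(true);
    assert_eq!(status, 200);
    assert_eq!(bodies.len(), 2);
    assert!(bodies[1].contains(r#""max_tokens""#));

    // Mirrors verify_openai_chat_uses_max_completion_tokens: one request.
    let (status, bodies) = run_probe(false);
    assert_eq!((status, bodies.len()), (200, 1));
}
```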

@cluster2600 cluster2600 requested a review from a team as a code owner March 24, 2026 20:57
@github-actions

github-actions bot commented Mar 24, 2026

All contributors have signed the DCO ✍️ ✅
Posted by the DCO Assistant Lite bot.

@github-actions

Thank you for your interest in contributing to OpenShell, @cluster2600.

This project uses a vouch system for first-time contributors. Before submitting a pull request, you need to be vouched by a maintainer.

To get vouched:

  1. Open a Vouch Request discussion.
  2. Describe what you want to change and why.
  3. Write in your own words — do not have an AI generate the request.
  4. A maintainer will comment /vouch if approved.
  5. Once vouched, open a new PR (preferred) or reopen this one after a few minutes.

See CONTRIBUTING.md for details.

@github-actions github-actions bot closed this Mar 24, 2026
@pimlock pimlock reopened this Mar 25, 2026
…robe

OpenAI GPT-5 models reject the legacy max_tokens parameter and require
max_completion_tokens. The inference validation probe now sends
max_completion_tokens as the primary parameter, with an automatic
fallback to max_tokens when the backend returns HTTP 400 (for
legacy/self-hosted backends that only support the older parameter).

Closes NVIDIA#517

Signed-off-by: Maxime Grenu <maxime.grenu@gmail.com>
@cluster2600 cluster2600 force-pushed the fix/517-max-completion-tokens branch from 3c89e9b to 44217f7 Compare March 25, 2026 18:19

Development

Successfully merging this pull request may close these issues.

OpenAI GPT-5 verification fails in openshell inference set due to max_tokens request parameter
